ABSTRACT

About one-fifth of the genes in the budding yeast 
are essential for haploid viability and cannot be 
functionally assessed using standard genetic 
approaches such as gene deletion. To facilitate 
genetic analysis of essential genes, we and others 
have assembled collections of yeast strains expressing
temperature-sensitive (ts) alleles of essential genes. To
characterize the phenotypes caused by essential gene
mutation, we used a panel of genetically engineered
fluorescent markers to examine the
morphology of cells in the ts strain collection using
high-throughput microscopy. Here, we describe the
design and implementation of an online database, 
PhenoM (Phenomics of yeast Mutants), for storing, 
retrieving, visualizing and data mining the quantita- 
tive single-cell measurements extracted from 
micrographs of the ts mutant cells. PhenoM allows 
users to rapidly search and retrieve raw images and 
their quantified morphological data for genes of 
interest. The database also provides several 
data-mining tools, including a PhenoBlast module 
for phenotypic comparison between mutant strains 
and a Gene Ontology module for functional
enrichment analysis of gene sets showing similar
morphological alterations. The current PhenoM
version 1.0 contains 78 194 morphological images
and 1 909 914 cells covering six subcellular
compartments or structures for 775 ts alleles spanning 491
essential genes. PhenoM is freely available at
http://phenom.ccbr.utoronto.ca/.

INTRODUCTION 

Essential genes are indispensable for the survival of an 
organism and are typically involved in fundamental bio- 
logical processes, such as cell wall and membrane biogen- 
esis, ribosome biosynthesis, DNA replication and 
cytoskeletal functions (1). As one of the most thoroughly 
characterized model organisms, the budding yeast 
Saccharomyces cerevisiae is frequently used to study 
genes involved in conserved biological pathways. In 
S. cerevisiae, ~18% of the genes are considered to be es- 
sential for growth on rich medium with glucose as the 
carbon source (1,2). Essential genes are evolutionarily 
more conserved between yeast and humans than 
non-essential genes, indicating that a deeper understanding 
of the functions of yeast essential genes will extend our 
understanding of human gene function (3). Previous 
studies have characterized functions of yeast essential 
genes using a variety of alleles which allow conditional or
partial inactivation of essential gene function. Many 
mutant strains have been constructed for genetic analysis 
of essential genes in the past several years, including those 
harbouring tetracycline (tet)-repressible promoter alleles 
(1), decreased abundance by mRNA perturbation alleles 
(4) or temperature-sensitive (ts) alleles (5,6). Among these 
methods, ts alleles are considered the most powerful for 
genetic analysis as they can perturb specific aspects of es- 
sential gene function (5-8). 

Temperature-sensitive alleles of essential genes allow 
cell viability at the permissive temperature since the 
protein encoded by the ts allele resembles that encoded 
by the normal or wild-type allele and the mutant 
cells are often morphologically indistinguishable from 
the wild-type cells. However, at the restrictive (non-permissive)
temperature, the ts gene product is defective, which may result
in a loss of function. Consequently, the cells show growth or
morphological defects at the restrictive temperature. We have
recently developed a
method to quantify cell morphological changes for thou- 
sands of mutants in parallel by coupling synthetic genetic 
analysis with high-content screening (SGA-HCS) (9,10). 
We used SGA-HCS to study essential yeast genes and 
recently published a phenotypic profiling of 497 yeast 
strains carrying ts conditional alleles of essential genes (6). 



To facilitate access to these valuable single-cell morpho- 
logical data including the raw images, we implemented a 
web server with an integrated database and the capability 
to allow visualization and analysis of morphological data. 
PhenoM (Phenomics of Yeast Mutants) is a web-based 
platform that contains quantitative single-cell measure- 
ments and morphological images of yeast cells carrying 
ts alleles in essential genes generated by HCS technology 
(10-13) [reviewed in (9)]. Figure 1 shows a detailed record
for allele smc4-1, which presents basic information about
the gene together with phenotypic images. We have developed a
user-friendly system with a fast and accurate search 
engine embedded with two data mining tools, PhenoDev 
and PhenoBlast. PhenoDev is a set of tools for calculating 
phenotypic differences between two mutant strains or 
between a mutant and a wild-type strain grown at the 
same or different temperatures. These tools can be used 
to identify genes that, if compromised, can give rise to a 
specific abnormal phenotype of interest; conversely, for a 
specific ts allele, these tools can also quickly identify 
the most significant phenotypic changes associated 
with the temperature shift. In contrast, the other tool, 
PhenoBlast, serves to quantify the morphological similar- 
ity between multiple ts alleles. This tool can be best used to 
discover other ts alleles which cause abnormal phenotypes 
similar to a ts allele of interest. Our system also provides
tools for Gene Ontology (GO) enrichment analysis to find
common GO annotation(s) among the top-ranked
mutants identified by PhenoDev or PhenoBlast.

[Figure 1: screenshot of the PhenoM allele detail page for smc4-1 (gene SMC4, ORF YLR086W). Fields shown: allele name, gene name, ORF, aliases, type, description, links to external databases and a download link, followed by phenotypic images at 26 and 32 °C with per-image links (download TIFF image, view image in ImageJ, view dataset of image) and tabs for the six compartments: actin, DNA damage, nucleus, mitochondria, plasma membrane and mitotic spindle.]

Figure 1. An example of the allele detail interface based on allele smc4-1. The interface includes basic information of the allele and phenotypic
images of the allele in different subcellular compartments or structures.

Currently, PhenoM version 1.0 holds 78 194 morphological
images of 1 909 914 cells, covering six yeast
subcellular compartments or structures, including actin fila- 
ments, DNA damage foci, mitochondria, nucleus, plasma 
membrane and mitotic spindle. These data were assembled 
by studying strains carrying 775 different ts alleles spanning 
491 essential genes. To our knowledge, PhenoM is the first 
database of phenotypes caused by mutation of yeast essen- 
tial genes that contains and analyses quantitative data 
derived from single-cell morphological images. PhenoM 
should be a valuable resource for the yeast and genomics 
communities since the large volume of data and the efficient 
analysis tools will facilitate the study of essential gene
function. Images are also available in a downloadable
format to facilitate future re-analysis using more 
sophisticated software developed by us or by other compu- 
tational experts. 

DATA SOURCE 

The PhenoM database contains morphological images and 
quantitative single-cell measurements of yeast strains 
carrying 775 different ts alleles associated with 491 essen- 
tial genes, covering the following six subcellular compart- 
ments or structures: actin filaments, DNA damage foci, 
mitochondria, nucleus, plasma membrane and mitotic 
spindle. As described in our previous report, MATα
query strains carrying different fluorescent markers were
mated to the ts allele array and MATa haploid ts strains
expressing different Green Fluorescent Protein (GFP) and/or
Red Fluorescent Protein (RFP) fusion proteins were
isolated using SGA technology (6,10,14). The following
cellular markers were used: a plasma membrane marker
(Psr1p-GFP), a reporter of DNA damage (Ddc2p-GFP),
a nuclear marker (Mad1p-nuclear localization signal-RFP),
a mitotic spindle reporter (GFP-Tub1), a mitochondrial
marker (OM45p-GFP) and an actin reporter
(Sac6p-GFP). Morphological images were generated by
using the SGA-HCS protocol (9,15) and quantitative 
measurements of a particular morphological feature were 
extracted at the single-cell level. For every reporter, several 
independent images were captured at different tempera- 
tures and these are referred to as 'sites' in the database. 
The quantitative measurements of each cell were 
categorized by several factors such as the temperature at 
which the yeast strain was grown, the cell cycle stage of the 
cell and the subcellular compartment or structure from 
which a feature was extracted. Since yeast cells have a rela- 
tively simple ellipsoidal shape, the dimensions of the cells 
can serve as a proxy to determine cell cycle phase (16). In 
PhenoM, we categorized cells into one of four phases of 
the cell cycle according to the ratio of daughter cell area to 
mother cell area: (i) unbudded cell; (ii) small budded cell; 
(iii) medium budded cell; and (iv) large budded cell 
(detailed information is provided on the PhenoM website).
Gene annotations were downloaded from SGD (17),
while GO annotations were downloaded from the GO
website (18) and were customized according to the gene
set within the database.
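
The four-way categorization by daughter-to-mother area ratio described above can be illustrated with a minimal sketch. The two cut-off values and the function name are hypothetical placeholders chosen for illustration only; the thresholds actually used are documented on the PhenoM website and are not given in this text.

```python
from typing import Optional

# HYPOTHETICAL cut-offs on the daughter/mother area ratio.
# The real values used by PhenoM are not stated in this text.
SMALL_BUD_MAX = 0.3
MEDIUM_BUD_MAX = 0.6


def bud_stage(mother_area: float, daughter_area: Optional[float]) -> str:
    """Assign one of the four cell cycle categories from cell areas."""
    if mother_area <= 0:
        raise ValueError("mother_area must be positive")
    # A cell without a detectable bud is treated as unbudded.
    if daughter_area is None or daughter_area <= 0:
        return "unbudded"
    ratio = daughter_area / mother_area
    if ratio < SMALL_BUD_MAX:
        return "small budded"
    if ratio < MEDIUM_BUD_MAX:
        return "medium budded"
    return "large budded"
```

Keeping the cut-offs as named constants makes it straightforward to substitute the documented values once they are known.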

DATA MINING 

The current PhenoM database contains data for 775 ts 
mutants covering 491 yeast essential genes. For each 
mutant strain, up to 865 morphological parameters were 
catalogued in the database, corresponding to different 
temperatures, cellular compartments and cell cycle 
stages. For example, the following parameters describing 
the morphology of the spindle were quantified and stored 
in the database: area, perimeter, length and orientation. 
To aid biologists in extracting biologically useful informa- 
tion, we designed and implemented two data mining tools, 
PhenoDev and PhenoBlast, which summarize the 
phenotypic deviations of genes among isogenic cell popu- 
lations and quantify the morphological similarities among 
different mutants, respectively. 

Phenotypic deviation: PhenoDev 

PhenoDev is a collection of analytical tools that 
can help users evaluate phenotypic deviations for a 
given morphological parameter in response to condition 
changes. It consists of three modules: PhenoTempDev, 
PhenoMutaDev and PhenoCycleDev. We provide a 
detailed user manual and examples for using these tools 
on the PhenoM website; the tools have user-friendly inter- 
faces that have been extensively tested and optimized. 

The phenotypic temperature deviation (PhenoTempDev)
module can help users identify ts strains that show a
significant change in a specific morphological parameter
between growth at the permissive and restrictive temperatures
(Figure 2). For example, if a researcher
is interested in asking whether a ts allele affects 'spindle 
length', he or she can use PhenoTempDev to extract two 
sets of measurements for 'spindle length' across the two 
cell populations. A statistical test then allows ranking of 
the ts alleles according to the significance of the difference 
between the tested strains. In essence, this tool can help 
researchers identify genes that are implicated in a particu- 
lar yeast morphological trait. Phenotypic mutation devi- 
ation (PhenoMutaDev) is intended to quantify the 
phenotypic deviation between the wild-type strain and a 
ts mutant strain for a selected set of morphological par- 
ameters at the same temperature. In contrast to 
PhenoTempDev, which detects morphological changes 
caused by temperature shift, PhenoMutaDev can 
identify mutants that have a significantly different pheno- 
type than wild-type cells. 
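
The ranking step of PhenoTempDev can be sketched as follows. The text does not name the statistical test used by this module, so the sketch assumes a two-sided Wilcoxon rank-sum test (the test named for PhenoBlast) with a tie-corrected normal approximation. The function names and the input layout (one pair of per-cell measurement lists per allele) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, normal approximation with tie correction.

    Returns (z, p). A positive z means values in x tend to exceed those in y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0   # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                     # all values identical
        return 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def rank_alleles(
    measurements: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float, float]]:
    """Rank alleles by p-value; each value is (permissive cells, restrictive cells)."""
    results = []
    for allele, (permissive, restrictive) in measurements.items():
        z, p = rank_sum_test(restrictive, permissive)
        results.append((allele, z, p))
    return sorted(results, key=lambda r: r[2])
```

The sign of z indicates whether the parameter increases or decreases at the restrictive temperature, which corresponds to the increased and decreased deviation counts reported in the results table.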

[Figure 2: screenshot of the PhenoTempDev results page. A bar chart summarizes the significant shared GO terms (or their parent terms) for the top 50 alleles; the search result table below it covers 36 features belonging to cell shape and/or subcellular compartments and lists NonDev, IncrDev, DecrDev and TotalDev for each allele. IncrDev, DecrDev and NonDev are the numbers of features whose deviance at the restrictive temperature (32 °C) is increased, decreased or unchanged relative to the permissive temperature (26 °C); TotalDev is the sum of IncrDev and DecrDev.]

Figure 2. An example of the 'PhenoTempDev Detail' interface. The top five alleles are shown in the figure.

Finally, phenotypic cell cycle deviation (PhenoCycleDev)
allows users to examine the phenotypic changes for a single
mutant throughout the cell cycle. As described above, we
categorized the cell cycle into four phases based on
daughter bud and mother cell ratios. Our system queries
three contiguous cell cycle transitions: (i) from unbudded to
small budded cells (G1-S phase); (ii) from small budded to
medium budded cells (S-G2); and (iii) from medium
budded to large budded cells (G2-M). For each cell cycle
transition, measurements of morphological parameters are
extracted and compared. The program then ranks all the
mutants according to changes in morphological measurements
associated with the three transitions. With this tool,
for any essential mutant, users can readily determine which
phenotypic features are affected during a particular cell
cycle phase transition.
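
The three contiguous transitions can be enumerated programmatically, as in the sketch below. The comparison statistic is an assumption: the text only says that measurements are "extracted and compared", so the change in the median is used here as a simple stand-in, and the function name is hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

# Phase order and transition labels as given in the text.
PHASES = ["unbudded", "small budded", "medium budded", "large budded"]
TRANSITION_LABELS = ["G1-S", "S-G2", "G2-M"]


def transition_changes(by_phase: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (transition label, change in median) for one parameter of one mutant.

    ASSUMPTION: the median difference stands in for whatever comparison
    PhenoCycleDev actually performs.
    """
    changes = []
    for label, before, after in zip(TRANSITION_LABELS, PHASES, PHASES[1:]):
        delta = median(by_phase[after]) - median(by_phase[before])
        changes.append((label, delta))
    return changes
```

Pairing each phase with its successor through `zip(PHASES, PHASES[1:])` guarantees that only contiguous transitions are compared.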

PhenoBlast 

The rationale for using morphological data to study gene 
function stems from our recent observation that genes 
that are annotated to have similar functions tend to 
give rise to similar morphological defects upon perturb- 
ation (6). The huge volume of morphology data generated 
from high-content screens and catalogued in the PhenoM 
database offers a unique and useful resource to search for 
alleles that show similar morphological phenotypes and 
thus have similar cellular functions. 

In contrast to searching databases for genes that have 
sequence similarities, searching for morphological 
similarities between any pair of yeast strains is computa- 
tionally and algorithmically more challenging. In an 
earlier study on phenotypic changes in Caenorhabditis
elegans RNA interference (RNAi) experiments, Gunsalus and
colleagues developed a software tool called PhenoBlast, in
analogy to BLAST, in which gene pairs are scored based
on the similarities in morphological phenotypes caused by
knockdown of gene expression by RNAi. With this tool,
47 phenotypic parameters were used to assess morphological
phenotypes in C. elegans (19). We
adapted PhenoBlast for comparing yeast morphology
data. In our algorithm, we do not directly compare the 
morphology between two different mutant strains. 
Instead, for each mutant strain, the morphological meas- 
urements of the population of mutant cells were first 
compared against the population of wild-type cells. We 
then used the Wilcoxon rank-sum test to determine whether
a particular morphological parameter (e.g. spindle length)
is significantly increased (labelled as '+1'), unchanged
(labelled as '0') or significantly decreased (labelled as
'-1') in the mutant strain. In this way, we obtain a
vector of 865 values consisting of '+1', '0' or '-1' for
each mutant strain, which describes the overall morphological
change in comparison to wild-type cells. We can
then quantify the morphological similarities between any
two mutant strains by calculating the similarities between
their corresponding vectors (for detailed information,
please see the PhenoM server). The algorithm described
above has been implemented and optimized in
PhenoBlast. Users can query a single mutant of interest
and see other mutants with similar phenotypic profiles in
real time. Figure 3 shows an example of a PhenoBlast
comparison which used default settings to query the
smc4-1 strain. The search identified the mcd1-73 strain,
which contains a ts allele in the MCD1 gene, as the top
hit; the smc4-1 and mcd1-73 strains shared similarity in
621 morphological parameters [122 parameters significantly
increased, 435 unchanged and 64 significantly
decreased (Figure 4)].
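
The vector comparison can be sketched as below, using the count names shown on the PhenoBlast result page (Nid, Ndd, Nad, Na, All). The function names are hypothetical, and ranking hits by the All count is an assumption; the text does not state which score orders the result list.

```python
from typing import Dict, List, Sequence, Tuple


def shared_counts(query: Sequence[int], other: Sequence[int]) -> Dict[str, int]:
    """Count features with the same label (+1, 0 or -1) in two profiles."""
    if len(query) != len(other):
        raise ValueError("profiles must have the same length")
    nid = sum(1 for q, o in zip(query, other) if q == 1 and o == 1)    # both increased
    ndd = sum(1 for q, o in zip(query, other) if q == -1 and o == -1)  # both decreased
    na = sum(1 for q, o in zip(query, other) if q == 0 and o == 0)     # both unchanged
    nad = nid + ndd
    return {"Nid": nid, "Ndd": ndd, "Nad": nad, "Na": na, "All": nad + na}


def rank_by_similarity(
    query_name: str, profiles: Dict[str, Sequence[int]], top: int = 50
) -> List[Tuple[str, Dict[str, int]]]:
    """Return up to `top` other alleles, most similar first.

    ASSUMPTION: hits are ordered by the 'All' count.
    """
    query = profiles[query_name]
    scored = [
        (name, shared_counts(query, vec))
        for name, vec in profiles.items()
        if name != query_name
    ]
    scored.sort(key=lambda item: item[1]["All"], reverse=True)
    return scored[:top]
```

As a consistency check, two synthetic 865-feature vectors that agree on 122 increased, 64 decreased and 435 unchanged features give All = 621, matching the arithmetic of the smc4-1 and mcd1-73 example.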

Search 

The search engine of the system provides two options, 
Quick Search and Morphology Search. In Quick Search, 
users can directly query any mutant of interest from the 
database by specifying its gene name, open reading frame 
(ORF) name, or aliases. A partial-word-match method is 
also incorporated in the Quick Search. On the other hand, 
if users want to find mutant strains that share morpho- 
logical similarity with a pre-specified gene, a search by 
morphology profiles can be performed. The Morphology 
Search utility is very flexible as it allows users to conduct 
specific advanced searches, including: (i) searching for
mutants in a certain cell cycle phase; (ii) searching for
mutants grown at a certain temperature; (iii) searching
for mutants with a given number of objects per cell in a
particular compartment; and (iv) searching for mutants within a
certain range of a particular cell shape parameter. All of
these options are unified in the Morphology Search.
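
The partial-word match of Quick Search can be pictured with the following sketch. The record layout, the function name and the choice of case-insensitive substring matching are assumptions; the server-side implementation is not described in the text. Matching on the allele name is included for convenience and is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlleleRecord:
    """Hypothetical record layout for one ts allele."""
    allele: str
    gene: str
    orf: str
    aliases: List[str] = field(default_factory=list)


def quick_search(records: List[AlleleRecord], term: str) -> List[AlleleRecord]:
    """Case-insensitive partial match on allele, gene name, ORF name or alias."""
    needle = term.strip().lower()
    if not needle:
        return []
    hits = []
    for rec in records:
        names = [rec.allele, rec.gene, rec.orf, *rec.aliases]
        if any(needle in name.lower() for name in names):
            hits.append(rec)
    return hits


# Illustrative records; the second record's ORF and alias should be checked against SGD.
RECORDS = [
    AlleleRecord("smc4-1", "SMC4", "YLR086W"),
    AlleleRecord("mcd1-73", "MCD1", "YDL003W", ["SCC1"]),
]
```

A query such as `quick_search(RECORDS, "smc")` matches on a fragment of the gene name, which mirrors the partial-word behaviour described above.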

GO enrichment analysis 

Depending on the users' objective, running PhenoDev or 
PhenoBlast automatically generates a list of mutants 
showing phenotypic abnormality. We also provide a 
module for GO enrichment analysis to test whether 
genes associated with some phenotypic defects are 
enriched for known functions or pathways. This function- 
ality is particularly useful when gauging the biological sig- 
nificance of some highly ranked genes whose mutation 
causes extreme phenotypic defects. To implement GO 
analysis, we compared several tools such as GoMiner 
(20), GOEAST (21) and GO::TermFinder (22). We 
chose to use the GO::TermFinder package (v0.86) for 
PhenoM since it uses a modular design and thus was 
easy to integrate into our system. 
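
For orientation, the kind of test that underlies GO term enrichment can be written in a few lines. GO::TermFinder scores terms with the hypergeometric distribution; the sketch below is an independent illustration of that test, not the package's code, and it omits the multiple-testing correction that a real analysis would need. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Probability of observing at least `hits` annotated genes in the sample.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     size of the gene list being tested (e.g. top-ranked mutants)
    hits:       genes in the list that carry the GO term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    if not (0 <= hits <= min(annotated, sample)):
        raise ValueError("hits must lie between 0 and min(annotated, sample)")
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, min(annotated, sample) + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(491, 20, 50, 8)` asks how surprising it would be to find 8 genes with a given annotation among 50 top-ranked genes when 20 of the 491 background genes carry that annotation (the counts 20, 50 and 8 are made up for illustration).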

DATABASE IMPLEMENTATION AND USER INTERFACE

Database implementation 

We designed and implemented PhenoM using Struts-Spring-Hibernate,
a lightweight enterprise-level
development framework (23,24). MySQL (v5.0.77) was
used as the underlying relational database to store the
large volume of quantitative measurements and the links
to the raw morphological images in the file system. At the
front end, Apache Server 2.0 is deployed to handle user
requests and to forward requests for Java servlets
and Java Server Pages to Apache Tomcat 6.0. To
provide a user-friendly interface, JavaScript and AJAX
(Asynchronous JavaScript and XML) technology were
adopted to program the client-side functionality and
Apache Tiles was used to create reusable interface
components. For online graphical visualization and
image manipulation, ImageJ (v1.44) was incorporated to
assist users with morphological image processing via Java
applets, and the Java open source package JFreeChart
(http://www.jfree.org/jfreechart/) was used to plot bar
charts. All source code development was performed in
Eclipse v3.4.2 (http://www.eclipse.org).

Figure 3. An example of the 'Search' interface of PhenoBlast.

[Figure 4: screenshot of the PhenoBlast Detail page for query allele smc4-1. A bar chart summarizes the significant shared GO terms (or their parent terms) for the top 50 alleles. The search result section covers 865 features belonging to cell shape and/or subcellular compartments, with 773 alleles in the database used for comparison. Column definitions: Nid, number of features significantly increased in both the query and the listed mutant; Ndd, number of features significantly decreased in both; Nad = Nid + Ndd; Na, number of features unchanged in both; All = Nad + Na. WT, wild-type.]

Figure 4. An example of the 'PhenoBlast Detail' interface. The interface includes the GO analysis result section and the 'Search Result' section. smc4-1
is used as the query allele and the alleles ranked 1-5 and 23-25 are shown in the figure.

User interface 

The PhenoM server provides the following primary web 
interfaces, which allow users to perform focused query 
and analysis tasks. 

The 'Allele detail interface' presents detailed informa- 
tion for a mutant strain. It consists of three parts: 

(i) Basic information and cross links to other databases 
such as SGD (17) or BioGRID (25); 

(ii) Phenotypic images and statistical information for
each of the six subcellular compartments or structures.
Several functional links are also provided,
such as options to download an original
micrograph in TIFF format, to view an image
through ImageJ and to open the dataset for
a particular image. For ease of analysis, all images
are annotated by names that correspond to the
mutant name, the reporter imaged, the temperature
at which the images were captured and the site
number (which indicates the set of independent
images used to produce a particular image for a
mutant); and

(iii) The PhenoBlast results list, containing up to fifty
other mutants that share similar morphological
profiles (Figures 1 and 4).

PhenoDev: this is a series of tools to measure pheno- 
typic deviations of strains carrying different ts alleles 
under varying conditions. We take PhenoTempDev as 
an example to navigate the interfaces of this function 
(Figure 2). Multiple options, such as morphological 
features in a certain compartment, bud size and 
others are provided and can be customized. An ana- 
lysis with the default parameters will lead to the 
PhenoTempDev results interface (Figure 2), which 
includes two sections: (i) the results of GO analysis and 
(ii) the deviation details of the top 50 mutants listed in 
descending order. 

PhenoBlast: this is a comparison tool for phenotyp- 
ic similarity and can be accessed either from the home 
interface of PhenoM or from the navigation bar. To 
execute PhenoBlast, both a query mutant and the 
relevant morphological features must be specified. Upon 
typing characters in the query allele box, the system will 
automatically suggest a list of the most similar mutant 
names. To facilitate a customized search, the list of 865 
morphological features is divided into three parts: (i) 386 
features from phenotypic mutation deviation; (ii) 193 
features from phenotypic temperature deviation; and 
(iii) 286 features from phenotypic cell cycle deviation
(detailed information is provided on the PhenoM
server). At least one feature must be chosen to conduct
the analysis. The PhenoBlast results interface will be displayed
following the execution of a successful query and
contains two parts: (i) the GO analysis results and (ii) the
details of 50 other mutants that have similar phenotypes.
Figures 3 and 4 show an example of navigating through
PhenoBlast. The SMC4 gene encodes a
member of a ubiquitous family of chromosome-associated
ATPases (26). The Smc4 protein is a subunit of the
condensin complex, which is required for chromosome
condensation and dynamics (27,28). To search for
mutants sharing similar morphological profiles with the
smc4-1 mutant strain, we can input smc4-1 in the query
allele box in the Search interface of PhenoBlast (Figure 3)
and then press the button 'Run PhenoBlast' using default
parameters. The system automatically leads to the Detail
interface of PhenoBlast. In the search result section, the
first, fifth and twenty-fourth ranked mutants, mcd1-73,
smc1-2 and smc3-1, respectively, all carry mutant alleles
in genes encoding cohesin subunits, and the list of
mutants overall is enriched in both DNA metabolic
process (P = 0.01218) and sister chromatid cohesion
(P = 0.04455) (Figure 4). The results from PhenoBlast
are thus consistent with known biology (6,29-31), and 
we believe that it will become a useful tool for suggesting 
new biological gene associations for the yeast research 
community. 

Download: PhenoM allows users to download data in
tab-delimited format, either for the full set of mutants or
for a single selected mutant.

AVAILABILITY AND FUTURE DIRECTION 

PhenoM can be freely accessed at
http://phenom.ccbr.utoronto.ca/. The future development of PhenoM will
include the following aspects: (i) PhenoM will include 
phenotypic information for additional yeast strains 
carrying mutant alleles of non-essential or essential 
genes as such data become available from our groups or 
elsewhere; (ii) data for more subcellular compartments or 
structures will be incorporated into the database to help 
biologists uncover the association between cell morph- 
ology and gene function; and (iii) more data mining and 
statistical analysis tools will be developed, especially tools 
that can integrate other types of data such as protein 
complexes or protein interactions. 

Our database ultimately aims to provide a centralized 
platform for the analysis of gene function by using quan- 
titative measurements and morphological images from 
yeast to mammalian cells. To accomplish this goal, 
PhenoM will provide a submission system to collect data 
from other image-based HCS experiments in the future. 